 Waller County


World's largest steam locomotive heads out on tour

Popular Science

Union Pacific's Big Boy No. 4014 will travel coast-to-coast in commemoration of the semiquincentennial. The world's largest operating steam locomotive is hitting the road (or rather, the tracks). Big Boy No. 4014 is heading out on its first coast-to-coast steam tour to celebrate the United States' 250th anniversary. The first leg begins on March 29, when Big Boy and historical passenger cars from Union Pacific's Heritage Fleet will travel from the locomotive's home base in Cheyenne, Wyoming, west toward California.


Ensemble BERT for Medication Event Classification on Electronic Health Records (EHRs)

Sarker, Shouvon, Dong, Xishuang, Qian, Lijun

arXiv.org Artificial Intelligence

Identification of key variables such as medications, diseases, and relations from health records and clinical notes has a wide range of applications in the clinical domain. The n2c2 2022 challenge provided shared tasks on natural language processing for clinical data analytics on electronic health records (EHRs), for which it built a comprehensive annotated clinical dataset, the Contextualized Medication Event Dataset (CMED). This study focuses on subtask 2 in Track 1 of this challenge, which is to detect and classify medication events from clinical notes, by building a novel BERT-based ensemble model. It started with pretraining BERT models on different types of big data such as Wikipedia and MIMIC. Afterwards, these pretrained BERT models were fine-tuned on CMED training data. The fine-tuned BERT models were then used to classify medication events on CMED testing data, producing multiple predictions. These predictions were integrated with voting strategies to build the final prediction. Experimental results demonstrated that BERT-based ensemble models can effectively improve strict Micro-F score by about 5% and strict Macro-F score by about 6%.
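The voting step described in this abstract can be illustrated with a minimal majority-vote sketch. The abstract does not say which voting strategies were used, so plain majority voting, the tie-breaking rule, and the function name `majority_vote` are assumptions made for illustration; the label strings in the usage note are placeholders.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label list by majority vote.

    predictions_per_model: list of lists; inner list i holds model i's
    predicted label for every example (all lists must have equal length).
    Ties go to the tied label voted for by the earliest model in the list
    (an assumed rule, not taken from the paper).
    """
    if not predictions_per_model:
        return []
    n = len(predictions_per_model[0])
    if any(len(p) != n for p in predictions_per_model):
        raise ValueError("all models must predict the same number of examples")
    final = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        best = max(counts.values())
        # First label, in model order, that reaches the top vote count.
        final.append(next(v for v in votes if counts[v] == best))
    return final
```

For example, with three hypothetical fine-tuned models that predict `["Disposition", "NoDisposition"]`, `["Disposition", "Undetermined"]`, and `["NoDisposition", "NoDisposition"]`, the combined output is `["Disposition", "NoDisposition"]`.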


Robotic Multimodal Data Acquisition for In-Field Deep Learning Estimation of Cover Crop Biomass

Johnson, Joe, Chalasani, Phanender, Shah, Arnav, Ray, Ram L., Bagavathiannan, Muthukumar

arXiv.org Artificial Intelligence

Accurate weed management is essential for mitigating significant crop yield losses, necessitating effective weed suppression strategies in agricultural systems. Integrating cover crops (CC) offers multiple benefits, including soil erosion reduction, weed suppression, decreased nitrogen requirements, and enhanced carbon sequestration, all of which are closely tied to the aboveground biomass (AGB) they produce. However, biomass production varies significantly due to microsite variability, making accurate estimation and mapping essential for identifying zones of poor weed suppression and optimizing targeted management strategies. To address this challenge, developing a comprehensive CC map, including its AGB distribution, will enable informed decision-making regarding weed control methods and optimal application rates. Manual visual inspection is impractical and labor-intensive, especially given the extensive field size and the wide diversity and variation of weed species and sizes. In this context, optical imagery and Light Detection and Ranging (LiDAR) data are two prominent sources with unique characteristics that enhance AGB estimation. This study introduces a ground robot-mounted multimodal sensor system designed for agricultural field mapping. The system integrates optical and LiDAR data, leveraging machine learning (ML) methods for data fusion to improve biomass predictions. The best ML-based model for dry AGB estimation achieved a coefficient of determination value of 0.88, demonstrating robust performance in diverse field conditions. This approach offers valuable insights for site-specific management, enabling precise weed suppression strategies and promoting sustainable farming practices.
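For readers unfamiliar with the reported metric, the coefficient of determination can be computed as in the sketch below. This is the standard definition (one minus the ratio of residual to total sum of squares), not code from the paper; the function name `r_squared` is chosen here for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot
```

A value of 1 means the predictions match the measured biomass exactly, while a model that always predicts the mean of the measurements scores 0.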


Data Augmentation via Diffusion Model to Enhance AI Fairness

Blow, Christina Hastings, Qian, Lijun, Gibson, Camille, Obiomon, Pamela, Dong, Xishuang

arXiv.org Artificial Intelligence

AI fairness seeks to improve the transparency and explainability of AI systems by ensuring that their outcomes genuinely reflect the best interests of users. Data augmentation, which involves generating synthetic data from existing datasets, has gained significant attention as a solution to data scarcity. In particular, diffusion models have become a powerful technique for generating synthetic data, especially in fields like computer vision. This paper explores the potential of diffusion models to generate synthetic tabular data to improve AI fairness. The Tabular Denoising Diffusion Probabilistic Model (Tab-DDPM), a diffusion model adaptable to any tabular dataset and capable of handling various feature types, was utilized with different amounts of generated data for data augmentation. Additionally, reweighting samples from AIF360 was employed to further enhance AI fairness. Five traditional machine learning models, namely Decision Tree (DT), Gaussian Naive Bayes (GNB), K-Nearest Neighbors (KNN), Logistic Regression (LR), and Random Forest (RF), were used to validate the proposed approach. Experimental results demonstrate that the synthetic data generated by Tab-DDPM improves fairness in binary classification.
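One way to quantify fairness in binary classification is the statistical parity difference, sketched below. The abstract does not name the metrics used, so this choice of metric is an assumption; the definition follows the common convention of subtracting the privileged group's positive-prediction rate from the unprivileged group's.

```python
def statistical_parity_difference(y_pred, groups, privileged):
    """P(pred = 1 | unprivileged) - P(pred = 1 | privileged).

    y_pred: iterable of 0/1 predictions.
    groups: iterable of group labels, aligned with y_pred.
    privileged: the label in `groups` treated as the privileged group.
    A value of 0 means both groups receive positive predictions equally often.
    """
    priv = [p for p, g in zip(y_pred, groups) if g == privileged]
    unpriv = [p for p, g in zip(y_pred, groups) if g != privileged]
    if not priv or not unpriv:
        raise ValueError("both groups need at least one example")
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)
```

Comparing this value for a model trained on the original data against one trained on the augmented data gives a simple before-and-after view of the effect of augmentation.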


Enhancing LLM Fine-tuning for Text-to-SQLs by SQL Quality Measurement

Sarker, Shouvon, Dong, Xishuang, Li, Xiangfang, Qian, Lijun

arXiv.org Artificial Intelligence

Text-to-SQLs enables non-expert users to effortlessly retrieve desired information from relational databases using natural language queries. While recent advancements, particularly with Large Language Models (LLMs) like GPT and T5, have shown impressive performance on large-scale benchmarks such as BIRD, current state-of-the-art (SOTA) LLM-based Text-to-SQLs models often require significant effort to develop auxiliary tools like SQL classifiers to achieve high performance. This paper proposes a novel approach that only needs SQL Quality Measurement to enhance LLM-based Text-to-SQLs performance. It establishes a SQL quality evaluation mechanism to assess the generated SQL queries against predefined criteria and actual database responses. This feedback loop enables continuous learning and refinement of model outputs based on both syntactic correctness and semantic accuracy. The proposed method undergoes comprehensive validation on the BIRD benchmark, assessing Execution Accuracy (EX) and Valid Efficiency Score (VES) across various Text-to-SQLs difficulty levels. Experimental results reveal competitive performance in both EX and VES compared to SOTA models like GPT4 and T5.
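Checking a generated query against actual database responses can be sketched as an execution-match test. The sketch below uses SQLite from the Python standard library and compares result rows as multisets; the comparison rule, the function names, and the treatment of failing queries as misses are assumptions for illustration, not the paper's or the benchmark's exact scoring code.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """True if the predicted query runs and returns the same rows as the gold query.

    Rows are compared as multisets, so row order is ignored (an assumption).
    Only run this against a disposable copy of the database: a generated
    query may modify or drop tables.
    """
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return False  # syntactically invalid or failing SQL counts as a miss
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        raise ValueError("need at least one query pair")
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)
```

A per-query boolean like this could also serve as a feedback signal during fine-tuning, since it reflects both syntactic validity (the query runs) and semantic correctness (the rows agree).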


Enhancing Deep Knowledge Tracing via Diffusion Models for Personalized Adaptive Learning

Kuo, Ming, Sarker, Shouvon, Qian, Lijun, Fu, Yujian, Li, Xiangfang, Dong, Xishuang

arXiv.org Artificial Intelligence

In contrast to pedagogies like evidence-based teaching, personalized adaptive learning (PAL) distinguishes itself by closely monitoring the progress of individual students and tailoring the learning path to their unique knowledge and requirements. A crucial technique for effective PAL implementation is knowledge tracing, which models students' evolving knowledge to predict their future performance. Based on these predictions, personalized recommendations for resources and learning paths can be made to meet individual needs. Recent advancements in deep learning have successfully enhanced knowledge tracing through Deep Knowledge Tracing (DKT). This paper introduces generative AI models to further enhance DKT. Generative AI models, rooted in deep learning, are trained to generate synthetic data, addressing data scarcity challenges in various applications across fields such as natural language processing (NLP) and computer vision (CV). This study aims to tackle data shortage issues in student learning records to enhance DKT performance for PAL. Specifically, it employs TabDDPM, a diffusion model, to generate synthetic educational records to augment training data for enhancing DKT. The proposed method's effectiveness is validated through extensive experiments on ASSISTments datasets. The experimental results demonstrate that the AI-generated data by TabDDPM significantly improves DKT performance, particularly in scenarios with small data for training and large data for testing.
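To make the knowledge tracing input concrete, the sketch below shows a one-hot encoding often used with DKT-style models, where each step of a student's history is a (skill, correctness) pair mapped to a vector of length twice the number of skills. The abstract does not describe the encoding used in this paper, so the layout and the function names are assumptions.

```python
def encode_interaction(skill_id, correct, num_skills):
    """One-hot vector of length 2 * num_skills for one (skill, correctness) step.

    Assumed layout: indices [0, num_skills) mean "skill answered incorrectly",
    indices [num_skills, 2 * num_skills) mean "skill answered correctly".
    """
    if not 0 <= skill_id < num_skills:
        raise ValueError("skill_id out of range")
    vec = [0] * (2 * num_skills)
    vec[skill_id + (num_skills if correct else 0)] = 1
    return vec


def encode_sequence(interactions, num_skills):
    """Encode a student's history, given as (skill_id, correct) pairs."""
    return [encode_interaction(s, c, num_skills) for s, c in interactions]
```

A sequence model consumes these vectors in order and predicts, at each step, the probability that the student answers each skill correctly next. Synthetic records produced by a generative model would be encoded the same way before being added to the training set.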


NEUROSEC: FPGA-Based Neuromorphic Audio Security

Isik, Murat, Vishwamith, Hiruna, Sur, Yusuf, Inadagbo, Kayode, Dikmen, I. Can

arXiv.org Artificial Intelligence

Neuromorphic systems, inspired by the complexity and functionality of the human brain, have gained academic and industrial attention due to their unparalleled potential across a wide range of applications. While their capabilities herald innovation, it is imperative to underscore that these computational paradigms, analogous to their traditional counterparts, are not impervious to security threats. Although the exploration of neuromorphic methodologies for image and video processing has been rigorously pursued, the realm of neuromorphic audio processing remains in its early stages. Our results highlight the robustness and precision of our FPGA-based neuromorphic system. Specifically, our system showcases a commendable balance between desired signal and background noise, efficient spike rate encoding, and unparalleled resilience against adversarial attacks such as FGSM and PGD. A standout feature of our framework is its detection rate of 94%, which, when compared to other methodologies, underscores its greater capability in identifying and mitigating threats within 5.39 dB, a commendable SNR. Furthermore, neuromorphic computing and hardware security serve many sensor domains in mission-critical and privacy-preserving applications.
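The FGSM attack mentioned here perturbs an input by a small step in the direction of the sign of the loss gradient. The sketch below applies it to a toy logistic-regression classifier so that the gradient can be written in closed form; the model, the function names, and the parameter values in the usage note are assumptions for illustration and are unrelated to the paper's spiking, FPGA-based system.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _sign(v):
    return (v > 0) - (v < 0)


def bce_loss(w, b, x, y):
    """Binary cross-entropy of a logistic-regression model on one example."""
    p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method: x_adv = x + eps * sign(dLoss/dx)."""
    p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # For logistic regression with BCE loss, dLoss/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * _sign(g) for xi, g in zip(x, grad)]
```

With `w = [2.0, -1.0]`, `b = 0.0`, `x = [0.5, 0.5]`, `y = 1`, and `eps = 0.1`, the perturbed input moves each feature by 0.1 in the direction that raises the loss. PGD repeats a step like this several times while projecting back into an allowed perturbation range.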


Comprehensive Validation on Reweighting Samples for Bias Mitigation via AIF360

Blow, Christina Hastings, Qian, Lijun, Gibson, Camille, Obiomon, Pamela, Dong, Xishuang

arXiv.org Artificial Intelligence

Fairness AI aims to detect and alleviate bias across the entire AI development life cycle, encompassing data curation, modeling, evaluation, and deployment, a pivotal aspect of ethical AI implementation. For addressing data bias, particularly concerning sensitive attributes like gender and race, reweighting samples proves efficient for fairness AI. This paper contributes a systematic examination of reweighting samples for traditional machine learning (ML) models, employing five models for binary classification on the Adult Income and COMPAS datasets with various protected attributes. The study evaluates prediction results using five fairness metrics, uncovering the nuanced and model-specific nature of reweighting sample effectiveness in achieving fairness in traditional ML models, as well as revealing the complexity of bias dynamics.
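The idea behind reweighting samples can be shown with a minimal sketch of the classic reweighing scheme, in which each example gets the weight P(group) * P(label) / P(group, label) so that group and label become statistically independent in the weighted data. This is written from the standard formula rather than taken from AIF360's source, and the function name `reweighing_weights` is chosen here for illustration.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Per-example weights w = P(group) * P(label) / P(group, label).

    groups: protected-attribute value of each example.
    labels: class label of each example, aligned with groups.
    Probabilities are estimated by counting over the dataset.
    """
    n = len(labels)
    if n == 0 or len(groups) != n:
        raise ValueError("groups and labels must be non-empty and aligned")
    n_group = Counter(groups)
    n_label = Counter(labels)
    n_joint = Counter(zip(groups, labels))
    return [
        n_group[g] * n_label[y] / (n * n_joint[(g, y)])
        for g, y in zip(groups, labels)
    ]
```

The resulting weights can be passed to any learner that accepts per-sample weights during training. Under-represented (group, label) combinations receive weights above 1 and over-represented ones receive weights below 1.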


Harnessing FPGA Technology for Enhanced Biomedical Computation

Alici, Nisanur, Inadagbo, Kayode, Isik, Murat

arXiv.org Artificial Intelligence

This research delves into sophisticated neural network frameworks like Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTMs), and Deep Belief Networks (DBNs) for improved analysis of ECG signals via Field Programmable Gate Arrays (FPGAs). The MIT-BIH Arrhythmia Database serves as the foundation for training and evaluating our models, with added Gaussian noise to heighten the algorithms' resilience. The developed architectures incorporate various layers for specific processing and categorization functions, employing strategies such as the EarlyStopping callback and Dropout layer to prevent overfitting. Additionally, this paper details the creation of a tailored Tensor Compute Unit (TCU) accelerator for the PYNQ Z1 platform. It provides a thorough methodology for implementing FPGA-based machine learning, encompassing the configuration of the Tensil toolchain in Docker, selection of architectures, PS-PL configuration, and the compilation and deployment of models. By evaluating performance indicators like latency and throughput, we showcase the efficacy of FPGAs in advanced biomedical computing. This study ultimately serves as a comprehensive guide to optimizing neural network operations on FPGAs across various fields.
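Adding Gaussian noise to a training signal can be sketched as follows. The abstract does not state the noise level or how it was specified, so the signal-to-noise-ratio parameter `snr_db` and the function name `add_gaussian_noise` are assumptions for illustration.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, rng=None):
    """Return a noisy copy of `signal` with zero-mean Gaussian noise.

    The noise variance is chosen so that the expected ratio of signal power
    to noise power equals `snr_db` decibels.
    """
    if not signal:
        raise ValueError("signal must be non-empty")
    rng = rng or random.Random()
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]
```

Passing a seeded `random.Random` instance makes the augmentation reproducible, which helps when comparing architectures on the same noisy recordings.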


Exploiting FPGA Capabilities for Accelerated Biomedical Computing

Inadagbo, Kayode, Arig, Baran, Alici, Nisanur, Isik, Murat

arXiv.org Artificial Intelligence

This study presents advanced neural network architectures including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTMs), and Deep Belief Networks (DBNs) for enhanced ECG signal analysis using Field Programmable Gate Arrays (FPGAs). We utilize the MIT-BIH Arrhythmia Database for training and validation, introducing Gaussian noise to improve algorithm robustness. The implemented models feature various layers for distinct processing and classification tasks, and techniques such as the EarlyStopping callback and Dropout layers are used to mitigate overfitting. Our work also explores the development of a custom Tensor Compute Unit (TCU) accelerator for the PYNQ Z1 board, offering comprehensive steps for FPGA-based machine learning, including setting up the Tensil toolchain in Docker, selecting architecture, configuring PS-PL, and compiling and executing models. Performance metrics such as latency and throughput are calculated for practical insights, demonstrating the potential of FPGAs in high-performance biomedical computing. The study ultimately offers a guide for optimizing neural network performance on FPGAs for various applications.
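Latency and throughput of an inference routine can be measured with a simple timing loop like the sketch below. The `infer` callable stands in for whatever runs a model on one input; it, the warm-up count, and the function name `benchmark` are hypothetical and do not come from the paper or the Tensil toolchain.

```python
import time


def benchmark(infer, inputs, warmup=2):
    """Return (mean latency in seconds, throughput in samples per second).

    infer: callable that processes one input (hypothetical stand-in for a model).
    inputs: non-empty list of inputs, processed one at a time.
    warmup: number of untimed calls made first to absorb one-time setup costs.
    """
    if not inputs:
        raise ValueError("need at least one input")
    for x in inputs[:warmup]:
        infer(x)
    start = time.perf_counter()
    for x in inputs:
        infer(x)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("timer resolution too coarse for this workload")
    return elapsed / len(inputs), len(inputs) / elapsed
```

Because the inputs are processed sequentially, throughput here is simply the reciprocal of mean latency. On hardware that batches or pipelines requests the two can diverge, so they are usually reported separately.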